Reverse Engineering Top-k Database Queries with PALEO
Abstract
Ranked lists are an essential means of succinctly summarizing outstanding items; they are computed over database tables or crowdsourced on dedicated websites. In this work, we address the problem of reverse engineering top-k queries over a database: given a relation R and a sample top-k result list, our approach, named PALEO, aims to determine an SQL query that returns the provided input result when executed over R. The core problem consists of finding the predicates of the WHERE clause that return the given items, determining the correct ranking criteria, and evaluating the most promising candidate queries first. To capture cases where only a sample of R is available, or where R differs from the relation that actually generated the input, we put forward a probabilistic model that assesses how likely a query is to output tuples that resemble or are close to the input data. We further propose an iterative candidate query execution that eliminates unpromising queries before they are executed. We report on the results of a comprehensive performance evaluation using data and queries of the TPC-H and SSB [14] benchmarks.
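The core search described in the abstract (find WHERE-clause predicates shared by the input tuples, find a ranking criterion, then execute candidates and keep those that reproduce the list) can be illustrated with a minimal Python sketch. It is not PALEO's algorithm. It assumes a tiny hypothetical in-memory relation R with invented columns (region, segment, revenue, orders), considers only conjunctions of equality predicates and a single numeric ranking column sorted in descending order, and requires an exact match, with no probabilistic model for sampled or differing data.

```python
from itertools import combinations

# Hypothetical toy relation R; column names and values are invented for illustration.
R = [
    {"name": "p1", "region": "EU", "segment": "retail", "revenue": 90, "orders": 5},
    {"name": "p2", "region": "EU", "segment": "retail", "revenue": 70, "orders": 9},
    {"name": "p3", "region": "EU", "segment": "online", "revenue": 95, "orders": 2},
    {"name": "p4", "region": "US", "segment": "retail", "revenue": 99, "orders": 7},
    {"name": "p5", "region": "EU", "segment": "retail", "revenue": 60, "orders": 8},
]

CATEGORICAL = ["region", "segment"]  # assumed candidate predicate columns
NUMERIC = ["revenue", "orders"]      # assumed candidate ranking columns


def run_query(rows, preds, rank_col, k):
    """Evaluate in memory the equivalent of
    SELECT name FROM R WHERE <preds> ORDER BY rank_col DESC LIMIT k."""
    matching = [r for r in rows if all(r[c] == v for c, v in preds)]
    matching.sort(key=lambda r: r[rank_col], reverse=True)
    return [r["name"] for r in matching[:k]]


def reverse_engineer(rows, top_k):
    """Return (predicates, ranking column) pairs whose query reproduces top_k exactly."""
    by_name = {r["name"]: r for r in rows}
    inputs = [by_name[n] for n in top_k]
    k = len(top_k)

    # Candidate atomic predicates: col = v shared by every input tuple.
    atoms = [(c, inputs[0][c]) for c in CATEGORICAL
             if all(t[c] == inputs[0][c] for t in inputs)]

    # Cheap pruning before any execution: a ranking column is only plausible
    # if the input tuples are already in non-increasing order on it.
    rank_cols = [c for c in NUMERIC
                 if all(a[c] >= b[c] for a, b in zip(inputs, inputs[1:]))]

    found = []
    # Try more specific WHERE clauses (more predicates) first.
    for size in range(len(atoms), -1, -1):
        for preds in combinations(atoms, size):
            for col in rank_cols:
                if run_query(rows, preds, col, k) == list(top_k):
                    found.append((preds, col))
    return found


print(reverse_engineer(R, ["p1", "p2", "p5"]))
# [((('region', 'EU'), ('segment', 'retail')), 'revenue')]
```

For the input list p1, p2, p5 the sketch keeps a single candidate, WHERE region = 'EU' AND segment = 'retail' ORDER BY revenue DESC LIMIT 3; the looser candidates are rejected because p3 or p4 enters the top 3. The ranking-column filter is a cheap check applied before any query runs, loosely in the spirit of discarding unpromising candidates early.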
Similar resources
Reverse Engineering Top-k Join Queries
Ranked lists have become a fundamental tool to represent the most important items taken from a large collection of data. Search engines, sports leagues and e-commerce platforms present their results, most successful teams and most popular items in a concise and structured way by making use of ranked lists. This paper introduces the PALEO-J framework which is able to reconstruct top-k database q...
Exploring Databases via Reverse Engineering Ranking Queries with PALEO
A novel approach to explore databases using ranked lists is demonstrated. Working with ranked lists, which capture the relative performance of entities, is a very intuitive and widely applicable concept. Users can post lists of entities for which explanatory SQL queries and full result lists are returned. By refining the input, the results, or the queries, users can interactively explore the databas...
Identifying the Most Influential Data Objects with Reverse Top-k Queries
Top-k queries are widely applied for retrieving a ranked set of the k most interesting objects based on the individual user preferences. As an example, in online marketplaces, customers (users) typically seek a ranked set of products (objects) that satisfy their needs. Reversing top-k queries leads to a query type that instead returns the set of customers that find a product appealing (it belon...
Estimation of Potential Product Using Reverse Top-k Queries
At present, most applications return to the user a limited set of ranked results based on the individual user’s preferences, which are commonly validated through top-k queries. From the perspective of a manufacturer, it is imperative that the products appear in the highest ranked positions for many different user preferences, otherwise the product is not visible to the potential customers...
Answering Why-not Questions on Reverse Top-k Queries
Why-not questions, which aim to seek clarifications on the missing tuples for query results, have recently received considerable attention from the database community. In this paper, we systematically explore why-not questions on reverse top-k queries, owing to its importance in multi-criteria decision making. Given an initial reverse top-k query and a missing/why-not weighting vector set Wm th...
Publication date: 2016